Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring SQS Propagation in line with the spec #1673

Closed
wants to merge 2 commits into from

Conversation

tsloughter
Copy link
Member

Opening as a draft because 1 test still fails. I'm hoping someone more familiar with the code might notice what I did wrong easily :)

The failure is in TestBoto3SQSInstrumentation.test_receive_message, no links are found:

>           self.assertEqual(1, len(span.links))
E           AssertionError: 1 != 0

I did some simple printf debugging to check what the MessageSystemAttributes were before calling extract and the ctx after calling extract:

        message_system_attributes = message.get("MessageSystemAttributes", {})
        print(message_system_attributes)
        links = []
        ctx = AwsXRayPropagator().extract(
            message_system_attributes, getter=boto3sqs_getter
        )
        print(ctx)

And the header is there as expected and it is extracted properly but the ctx is then empty:

message_system_attributes:

{'AWSTraceHeader': {'StringValue': 'root=1-00000000-00000000000000000000000a;parent=0000000000000001;sampled=1', 'DataType': 'String'}}

Value in Boto3SQSGetter.get:

root=1-00000000-00000000000000000000000a;parent=0000000000000001;sampled=1

ctx:

{}

Description

The instrumentation spec for SQS defines that propagation is done with X-Ray to the key AWSTraceHeader in MessageSystemAttributes. This patch brings the propagation in line with the spec from what it was doing before of using the global propagator and MessageAttributes.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Existing tests were updated to extract/inject the propagated context with the new format.

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@tylerbenson
Copy link
Member

@tsloughter does AWS support x-ray propagation on the message attribute level? I think in the Java instrumentation, the propagation is done via http headers on the outgoing request rather than the request body. I'm not sure if there's a difference in behavior.

@mariojonke
Copy link
Contributor

I think context propagation with MessageSystemAttributes will only work in a very limited way. If you search for MessageSystemAttributes in the AWS SDK docs for SQS, they are only defined for send_message and send_message_batch. For receive_message they are not returned by AWS, meaning that the propagated context cannot be extracted anymore when the SDK is used at the receiver side.
Context propagation via the XRay propagator only seems to work if the SQS receiver is a lambda function where AWS internally converts the MessageSystemAttributes to an XRay environment variable that the lambda function then can access.
For other scenarios like SQS.send_message -> SQS.receive_message, SNS.publish -> (SNS subscription to SQS) -> SQS.receive (either via SDK or lambda), ... the MessageSystemAttributes will become unaccessible and break context propagation.
I guess this is the reason why context propagation in this repo and in other techs like js or ruby propagate context via MessageAttributes instead of MessageSystemAttributes.

@tsloughter
Copy link
Member Author

Ah. I thought I saw them in the response body but I guess not.

My reading of the SQS spec was that they shouldn't be on regular attributes because of user limits there and instead to use system attributes, but if those aren't propagated to the user that doesn't help :)

@tsloughter
Copy link
Member Author

@tylerbenson I went looking and can't find where Java does this for SQS.

@tsloughter
Copy link
Member Author

Ah. It does look like Python is doing the same, https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation/opentelemetry-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/__init__.py#L116

So I could instead update that too always use XRay and remove the existing propagation in attributes..?

@tsloughter
Copy link
Member Author

Or wait, wouldn't putting it in headers break sending batches? Batches will have messages with separate trace contexts.

@tylerbenson
Copy link
Member

Good question. I don't know the answer to that. In theory we should be doing something similar to what AWS is doing for their sdks: https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-python-patching.html

@tsloughter
Copy link
Member Author

@mariojonke I noticed AWSTraceHeader is in the AttributeNames for receive_message https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sqs.html#SQS.Queue.receive_messages

Might that be the value from MessageSystemAttributes?

@mariojonke
Copy link
Contributor

It looks like it is the same thing but i haven't tested it yet. The instrumentation probably would need inject into the AttributeNames to tell AWS that it wants to receive the AWSTraceHeader and then extract it from the Attributes instead of the MessageAttributes.
However, as far as i can tell this only works for SQS to SQS scenarios and i'm not sure if it would require to explicitly enable XRay in AWS. When looking at the SNS docs there doesn't seem to be any mention of AWSTraceHeader or MessageSystemAttributes, so tracing SNS to SQS messages probably isn't working unless the XRay header is somehow magically injected (maybe into the underlying HTTP request).
I guess this would need to be tested and also if it works with enabeling/disabeling raw message delivery on the SNS to SQS subscription.

Another thing to consider i guess is that the XRay trace header only contains trace ID, span ID and sampling flag. This is more or less in line with the W3C traceparent but it doesn't seem to provide a tracestate or similar. So any vendor specific information will be lost when tracing an SQS message and using the XRay propagator.

@Ma-Ba
Copy link

Ma-Ba commented Feb 16, 2023

The proposed change will introduce an incompatibility with SNS and SQS tracing scenarios currently in use, that rely on the global propagator being called to inject/extract the context into message attributes.

See open-telemetry/opentelemetry-specification#3212 (review) for a more detailed explanation.

I therefore suggest to keep the call to the global propagator's inject and extract methods in place and add the X-Ray propagator in a backwards compatible way.

@tsloughter
Copy link
Member Author

Experimenting with the x-ray sdk and rereading the existing spec I'll be updating this PR to no longer inject the AWSTraceHeader attribute. This is added by AWS when the HTTP x-ray header is found in the send request. Extraction will look for AWSTraceHeader in the attributes (not the message attributes).

I still need to get this working with otel python boto and boto3sqs to make sure it works as I expect based on use of x-ray python sdk. I'll be asking about that on slack tomorrow :)

I'm not sure what to do about backwards compatibility.

@tsloughter
Copy link
Member Author

I'm going to close this and open a new PR only focused on botocore.

I still think changes are needed to SQS instrumentation, but not the ones in this PR. I'll likely make one in the near future that adds support for checking for AWSTraceHeader in the System Attributes. Injection should not be done how it is in this PR but instead left to the PR I'm making on botocore where the context is injected into the HTTP headers with x-ray.

@tsloughter tsloughter closed this Feb 17, 2023
@Oberon00
Copy link
Member

Oberon00 commented Feb 20, 2023

@tsloughter

I'll likely make one in the near future that adds support for checking for AWSTraceHeader

If you meant setting, this should not actually be needed: If you set the X-Ray HTTP header when sending an SQS message, AWS will automatically fill in the AWSTraceHeader message system attribute based on that.

@tsloughter
Copy link
Member Author

@Oberon00 right, it fills it in, but it needs to be checked for on the receiving side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants